[import] Separate dataset import logic into separate modules #171
Conversation
Just had a minor question, otherwise it looks really great!
```js
// Create batches of documents to import. Try to keep batches below a certain
// byte-size (since documents may vary greatly in size depending on type etc.)
const batches = batchDocuments(weakened)
```
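For context, a byte-size-aware batching helper might look something like the sketch below. The threshold value and implementation details are assumptions for illustration, not the actual `batchDocuments` from this PR.

```js
// Hypothetical per-batch byte limit; the real value used by the importer may differ
const BATCH_SIZE_LIMIT = 1024 * 1024 // ~1 MB

function batchDocuments(documents) {
  const batches = [[]]
  let currentSize = 0

  for (const doc of documents) {
    const size = Buffer.byteLength(JSON.stringify(doc), 'utf8')
    const current = batches[batches.length - 1]
    // Start a new batch when adding this document would exceed the limit,
    // unless the current batch is empty (an oversized document gets its own batch)
    if (current.length > 0 && currentSize + size > BATCH_SIZE_LIMIT) {
      batches.push([doc])
      currentSize = size
    } else {
      current.push(doc)
      currentSize += size
    }
  }

  return batches
}
```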
It's not clear to me, so just checking: `importBatches`, `uploadAssets` and `strengthenReferences` must be done in sequence, right?
Sort of. `importBatches` must be done first, but `uploadAssets` and `strengthenReferences` could be done in parallel. The reason we're not doing this at the moment is that if one of these operations fails, there is no way to stop the other. I've made a note of it in the readme as a definite improvement for the future, but it would require using something like a queue instead of simply mapping over items with a concurrency limit.
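To illustrate the ordering described above (the function names come from the discussion, but the signatures and orchestration are assumptions for illustration):

```js
// A sketch of the sequencing constraint discussed above.
async function runImport(batches) {
  // Must complete first: documents have to exist before assets can be
  // attached to them or references strengthened.
  const documents = await importBatches(batches)

  // Today these run in sequence, because a failure in one cannot
  // cancel the other:
  await uploadAssets(documents)
  await strengthenReferences(documents)

  // The possible future improvement would be something like:
  //   await Promise.all([uploadAssets(documents), strengthenReferences(documents)])
  // but Promise.all alone does not abort the sibling operation on failure,
  // hence the note about needing a queue with cancellation support.
}
```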
I see, thanks for the clarification! And +1 for improved error handling in the future. If `strengthenReferences` fails right now, will it result in dangling assets too, or are the asset uploads idempotent?
Asset uploads are done first and connected to their corresponding documents. If this operation fails, it will leave dangling assets. However, when re-running the import, it should find the same assets already uploaded and reuse those.

Failures when strengthening references will also leave some references weak, and this is a situation that can easily occur if you are referring to documents that do not exist (incorrect IDs or similar). I've also made a note of this in the readme: we should attempt to check whether referenced documents actually exist before starting the upload.

All in all, "rolling back" a failed upload is pretty hard.
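A minimal sketch of the asset-reuse behavior described above, assuming already-uploaded assets can be found by SHA-1 content hash. The helper name and GROQ query are illustrative, not the actual `@sanity/import` implementation:

```js
const crypto = require('crypto')

// Hypothetical idempotent upload: reuse an existing asset document with the
// same content hash, otherwise upload and return the new asset's ID.
async function uploadAssetIdempotent(client, buffer, type) {
  const sha1 = crypto.createHash('sha1').update(buffer).digest('hex')
  const assetType = type === 'image' ? 'sanity.imageAsset' : 'sanity.fileAsset'

  // Look for an asset document with a matching content hash
  const existingId = await client.fetch(
    '*[_type == $assetType && sha1hash == $sha1][0]._id',
    {assetType, sha1}
  )
  if (existingId) {
    return existingId
  }

  // No match found; upload the asset and return the new document ID
  const uploaded = await client.assets.upload(type, buffer)
  return uploaded._id
}
```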
> When re-running the import, it should find the same assets already uploaded and reuse those.

This is great. I have no further comments 👩‍⚖️
> This is great. I have no further comments 👩‍⚖️

Achievement unlocked.
This PR removes the dataset import logic from `@sanity/core` and moves it into a separate module called `@sanity/import`. This makes it much easier to maintain and write tests for, and makes it much more reusable. In the process, I've rewritten a lot (most) of the logic to make it much easier to work with and debug, with the tradeoff that it will use more memory and be slightly slower.

I've also made a separate, installable CLI tool that only handles the dataset import logic: `@sanity/import-cli`.

In the process of rewriting, a few major improvements have been made:

- Sets the `_type` field if not set

There are a couple more things we should do to improve the import at some point, which I've outlined in the readme, but for the most part this should now work better and more reliably than the old import.
Sorry about the humongous size of the PR.
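For readers unfamiliar with the weak-reference approach used by the importer (references are imported as weak so insertion order doesn't matter, then strengthened once all documents exist), a rough sketch of what strengthening could look like with `@sanity/client` patches. The `refs` shape here is hypothetical; the key point is that removing the `_weak` flag from a reference makes it strong.

```js
// Hypothetical input: a list of weak-reference locations, e.g.
//   [{documentId: 'movie-1', path: 'director'}]
async function strengthenReferences(client, refs) {
  for (const {documentId, path} of refs) {
    await client
      .patch(documentId)
      .unset([`${path}._weak`]) // dropping _weak turns the weak reference into a strong one
      .commit()
  }
}
```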